A method of processing print data allowing for rendering
bands of print data in parallel. A main processor (52) of a
single-chip multiprocessor converts an incoming page of
print data into paths. The paths are then converted to
primitives and the primitives are rasterized using parallel
processors (60, 62, 64, 66). The parallel processors (60, 62,
64, 66) work in concert with the main processor (52) such
that bands of the final print image are rendered into a frame
buffer (58) in parallel, allowing for faster and more efficient
processing of print data.
METHOD OF PROCESSING PRINT DATA
USING PARALLEL RASTER IMAGE 
PROCESSING 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to raster image processing in print- 
ing systems, more particularly to such processing in a 
system with a main processor and several parallel proces- 
sors. 

2. Background of the Invention 

Modern printing systems use some sort of processor to
interpret a program representing the image to be printed. The 
program is written in a page description language (PDL) that 
describes the image on the page in a format that the 
processor can understand. The processor must then convert 
the page description into a format that is compatible with the 
actual printing hardware, called a raster bit-map. This is 
typically a two-step process. The first step is interpretation 
and the second is rasterization. 

During interpretation the PDL is parsed and dictionaries 
are searched that allow translation of the PDL operators into 
paths. Paths are sets of graphical objects describing the 
outlines of graphical objects and are accompanied by fills. 
The graphical objects are usually described in terms of 
straight lines and parametric curves, such as bezier curves, 
splines, etc. A fill value is a single solid color that specifies 
the color of the fill area inside the outlines. Sometimes these 
fills occur through masks. In some cases, the fill value is not 
a single color but a scanned image. 

Paths are transformed from source coordinate space to the 
device coordinate space, the device being the hardware of 
that particular printer. The paths are then reduced to poly- 
gons through a piece-wise linear approximation of any 
constituent parametric curves. The polygons are then further 
reduced to lower level primitives, such as trapezoids and/or 
run-arrays. The goal of all of this reduction is to simplify the 
second stage of the process, rasterization. 
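
As a minimal sketch of the piece-wise linear approximation mentioned above, the following Python fragment flattens one cubic bezier segment into a polyline by sampling it at uniform parameter steps. The function name and the fixed segment count are assumptions made for illustration; an actual interpreter might instead subdivide adaptively against a flatness tolerance.

```python
def flatten_cubic_bezier(p0, p1, p2, p3, segments=8):
    """Approximate a cubic bezier curve by `segments` straight lines.

    p0..p3 are (x, y) control points. Returns segments + 1 points.
    Hypothetical helper, not taken from the patent.
    """
    points = []
    for i in range(segments + 1):
        t = i / segments
        mt = 1.0 - t
        # Bernstein basis weights for a cubic curve.
        b0 = mt * mt * mt
        b1 = 3.0 * mt * mt * t
        b2 = 3.0 * mt * t * t
        b3 = t * t * t
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points
```

Consecutive points of the returned list form the edges of the polygon that is later reduced to trapezoids or run-arrays.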

Other tasks are also performed during interpretation such 
as color conversion, decompression and outline font pro- 
cessing. Color conversion means converting data from 
source color space to device color space. Decompression is 
only necessary if the incoming image data is in a compressed 
format such as JPEG or LZW. Fonts are usually in outline
form in which they are described in terms of straight lines 
and bezier curves, and must be converted into bitmap format 
at this stage. 
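
To make the color conversion step concrete, the sketch below converts a normalized RGB source color to CMYK using the simple textbook formula. This is only an assumed illustration: the function name is hypothetical, and a real printer would normally convert through calibrated device profiles rather than this naive mapping.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK conversion; all channels are in the range 0..1.

    Illustrative only: ignores device calibration and ink behavior.
    """
    k = 1.0 - max(r, g, b)
    if k >= 1.0:
        # Pure black: avoid dividing by zero below.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return (c, m, y, k)
```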

The second stage of the process is rasterization. In this 
stage, the previously produced graphics primitives are scan 
converted, screened and rendered into the frame buffer of the 
print hardware. This is typically the most time consuming 
stage of the process, due to the high resolution required of 
most printing systems. 
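
The scan conversion of a single primitive can be sketched as follows, assuming a trapezoid with horizontal top and bottom edges and a frame buffer modeled as a list of pixel rows. The function name, the parameter layout and the rounding rule are assumptions for illustration, not details taken from any particular printer.

```python
import math


def scan_convert_trapezoid(frame, y_top, y_bottom,
                           xl_top, xr_top, xl_bot, xr_bot, fill):
    """Fill a trapezoid with horizontal top and bottom edges.

    frame is a list of rows (lists of pixel values). y_bottom and the
    right edge are exclusive. Pixels outside the frame are clipped.
    """
    height = y_bottom - y_top
    for y in range(max(y_top, 0), min(y_bottom, len(frame))):
        t = (y - y_top) / height  # 0.0 at the top edge
        # Interpolate the left and right edges, rounding to nearest pixel.
        x_left = math.floor(xl_top + t * (xl_bot - xl_top) + 0.5)
        x_right = math.floor(xr_top + t * (xr_bot - xr_top) + 0.5)
        row = frame[y]
        for x in range(max(x_left, 0), min(x_right, len(row))):
            row[x] = fill
```

The inner loop touches every pixel of every scan line, which is why this stage dominates processing time at high resolutions.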

Current systems typically use one processor that has to 
perform all of the above tasks. These tasks are performed 
sequentially resulting in a slow and inefficient printing 
process. Therefore, a system and method is needed to allow 
more efficient and faster raster image processing for printing 
systems. 

SUMMARY OF THE INVENTION 

In accordance with embodiments of the invention, a page 
of print data is processed by a single-chip multiprocessor in 
three stages to convert the print data to a bit map. The term
single-chip multiprocessor is used to describe the parallel
processors used in the invention, which are all resident on
the same semiconductor chip. The main processor may be on
that same single chip or may be external to the chip having
all of the parallel processors.

In the first stage, called language interpretation, the page
description is converted to paths with accompanying fill and
mask values. The paths and their fills and masks undergo
geometry processing in the second stage. Geometry processing
can be further broken down into two parallel stages:
boundary processing and source data processing. During
boundary processing the paths are segmented to low level
primitives (triangles, trapezoids, etc.). Source data processing
involves operations on fills and masks such as
decompression, color conversion, outline font processing,
etc. Finally the low level primitives are rasterized into a
frame buffer, which is then sent to the print engine. In one
embodiment of the invention, the frame buffer is segmented
and used to allow the main processor and the parallel
processors to render bands of the final image in parallel.
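
One way to picture the segmentation into bands is to index each primitive by the horizontal bands its vertical extent overlaps. The sketch below assumes equal-height bands and primitives given as (name, y_min, y_max) tuples with y_max exclusive; these names and the data layout are hypothetical and chosen only for illustration.

```python
def bucketize(primitives, page_height, num_bands):
    """Sort primitives into per-band buckets by vertical extent.

    primitives: iterable of (name, y_min, y_max), y_max exclusive.
    Returns a list with one bucket (list of names) per band.
    """
    band_height = -(-page_height // num_bands)  # ceiling division
    buckets = [[] for _ in range(num_bands)]
    for name, y_min, y_max in primitives:
        first = max(y_min, 0) // band_height
        last = min(y_max - 1, page_height - 1) // band_height
        # A primitive spanning several bands is listed in each of them
        # and must be clipped to the band boundary when drawn.
        for band in range(first, last + 1):
            buckets[band].append(name)
    return buckets
```

Each bucket can then be handed to whichever processor renders that band.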

In a sort-first embodiment of the invention, the main
processor creates the paths using some assistance from the
parallel processors to accelerate the tasks. This is typically
done by using a task queue, with all of the parallel processors
eligible to assist. The parallel processors then perform
the boundary processing part of geometry processing to
convert the paths to primitives and the rasterization to send
the data to the print engine. In an alternate sort-first
embodiment, a subset of the parallel processors are dedicated
to assisting in the language interpretation of a current
page and a second subset performs the geometry processing
and rasterization of the previous page.

In a sort-middle embodiment, the main processor sends
the paths to geometry processing parallel processors which
then send the primitives to the rasterization processors in a
round-robin or queue manner. In an alternate embodiment, the
geometry processing parallel processors perform geometry
processing for the current page and the rasterization processors
work on the previous page. In a third embodiment, all
of the parallel processors perform geometry and then
rasterization processing on one page.

BRIEF DESCRIPTION OF THE DRAWINGS 

For a more complete understanding of the present invention
and for further advantages thereof, reference is now
made to the following Detailed Description taken in conjunction
with the accompanying Drawings in which:

FIG. 1 shows a flowchart of a raster image processing
pipeline in accordance with the invention.

FIGS. 2a and 2b show sort-first implementations of one
embodiment of the invention.

FIGS. 3a-c show sort-middle implementations of one
embodiment of the invention.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

FIG. 1 shows a raster image processing pipeline flowchart.
Current approaches identify two steps, interpretation
and rasterization. In accordance with the present invention,
three steps are identified by dividing the conventional
interpretation step into two steps: language interpretation and
geometry processing. The functions enclosed in box 10
involve language interpretation, those in box 20 are geometry
processing and those in box 30 are rasterization processes.
The raster image processing starts when a PDL is
sent to the processor. The processor then converts the PDL
to graphic objects at step 10. This step then generates
graphic objects, hereinafter called paths, which move to
geometry processing in box 20. In addition to the graphic
objects generated, the fills for those objects are also
identified. Fills can be single solid colors 14 or scanned images 16.
Any masks associated with the objects are also identified.
Some masks are in a bitmap format 18, others are in an
outline form 22 and will be converted to a bitmap in the next
step.

All of the resulting data at points 10, 14, 16, 18 and 22
then undergoes geometry processing in box 20. The data
going on for geometry processing undergoes either boundary
processing in box 20a or source data processing in box
20b.

The graphics objects produced in step 10 undergo boundary
processing starting at box 24. The graphics objects,
which are defined in some source coordinate space, are
transformed to device coordinate space. At step 26 the
objects are then converted to polygons for easier handling.
If necessary or desired, clipping of the polygon can be
performed at step 28. Finally, at step 32, the polygons
created in step 26 are converted to primitives. Graphics
primitives can be trapezoids, triangles, run-arrays, etc. Such
primitives are easier for typical graphics processors to
handle than polygons. On a specialized processor, such as
the MVP (TMS320C80, by Texas Instruments), polygons are
also easily handled. This allows a choice not to further
decompose polygons into primitives. However, for this
discussion, reduction to primitives is assumed.

The data for the fills and masks undergoes source data
processing. The solid color fills from step 14 must be
converted from the source color space to the device color
space at step 34. Source image data that is compressed, such
as in JPEG format, must be decompressed at step 36 and
then converted from source color space to device color space
at step 38. Outline font data is first transformed from source
to device coordinate space and then converted to bitmap
masks at step 42.

Source data processing can also include transformation of
image and bitmap masks from their source to the device
coordinate space as shown at steps 39 and 40, respectively.
Alternately, such transformations can be performed during
scan conversion (transformation on the fly) at step 44.

The rasterization process 30 has two stages: scan
conversion and halftoning. During scan conversion, 44, the
input graphics primitives, which are described in a graphical
form (for example, a triangle is described by its three
vertices), are converted to a pixel form and written to the
frame buffer. After all the primitives are scan converted, the
frame buffer is screened/half-toned at step 46. It is also
possible to perform screening during scan conversion as
discussed further in U.S. patent application Ser. No. 08/941,871,
entitled "Screening Methods for a Single-Chip Multiprocessor."
During screening the bit-depth of the pixels in
the frame buffer is reduced to match that of the device.
Finally the screened frame buffer is sent to the printer device
for marking a page.

Throughout this process there are several possible places
to implement clipping. Clipping is the process where a
graphics object is modified to fit within the desired view
window by clipping it to the view window boundaries.
Clipping could occur at step 28 when a polygon is clipped
to result in new polygons or at step 32 during the polygon
decomposition process. It could also occur at step 44 during
scan conversion. In this case, clipping is done on the fly, with
the clip window being used as a boundary and only those
pixels that fall within that boundary being generated by the
scan-conversion process.

Regardless of where clipping is used, the entire process
can be implemented using one processor or multiple
processors. However, current implementations have a problem
with speed and efficiency in the process. Using a single
processor results in that processor having to perform all of
the above functions. This obviously leads to bottlenecks and
inefficiencies, especially in the rasterization process.

Parallel processing of this data can be done using a
single-chip multiprocessor. This type of implementation usually
occurs in one of two ways. The first approach divides the
frame buffer into several tiles equal to the number of
processing elements (PE) and associates one PE with each
tile. This can lead to low processor utilization when few
processors are involved. A second approach is to use a
virtual buffer system. In this type of system, the first set of
regions are assigned a like number of PEs. When a PE is
done with its region, it picks up the next region in the
pipeline. This allows for dynamic scheduling and higher
processor utilization.

This last approach can be adapted for use with embodiments
of the present invention. In one embodiment of the
invention, a page is logically divided into several bands.
If memory is available for the entire page, then in the
preferred embodiment the p parallel processors (PPs) are
first assigned to the first p bands, where p is the number of
PPs. As soon as a PP finishes generation of pixels for that
band it is assigned to the next band to be processed in the
sequence. After all the bands are rasterized, the frame buffer
is transferred to the print engine for printing.

To save memory costs, it may be desirable to assign a
memory with less than the full page buffer size to the
processor. For example, suppose (p+k) bands fit in the
available memory. The memory is divided into (p+k) band
buffers, where p is the number of PPs. The printer can still
operate with the lesser memory if the PPs can rasterize fast
enough into the remaining band buffers as one is transferred
to the printer for printing. In other examples, the number of
buffers (p+k) may be equal to the number of processors p
(k=0), or even less than the number of processors.

A scheduler process running on the MP operates in the
following manner. A free band buffer and PP are assigned for
each band in sequence. A server process running on the PP
is allowed to rasterize the corresponding primitives for that
band into the band buffer. This server process is described in
U.S. patent application Ser. No. 08/957,475, "Embedded
Display List Interpreter," incorporated by reference herein.
The scheduler process then waits for the next free processor
and free band buffer to assign to the next band.

When the server process on the PP finishes its rasterization
task it interrupts the scheduler process to signal that it
is free. The scheduler adds the PP to its free list.

A printer imaging process interrupts the scheduler process
when the print engine is ready for the next band. The
scheduler responds by initiating a transfer process for the
corresponding buffer. When the transfer is completed, the
transfer process interrupts the scheduler and the scheduler
puts the corresponding band buffer in its free list.

Note that initially all PPs and band buffers are on the free
list. It is advantageous to start the print engine when the first
(p+k) bands are rasterized to the available (p+k) buffers.
When the printer process interrupts the scheduler for the
next band, it must be immediately available for certain
printers that cannot stop once printing is initiated. For such
printers it may be advantageous to identify complex bands
and pre-render them before the print engine is started.

The segmentation of graphical objects to their respective
bands is termed "bucketization" as the graphical objects are
indexed into buckets/bins corresponding to each band. With
reference to FIG. 1, bucketization can occur at points a, b,
c, or d, in the raster image processing pipeline. Following a
classification scheme for 3D parallel rendering presented in
S. Molnar, et al. "Sorting Classification of Parallel Rendering"
(IEEE Computer Graphics and Applications, Vol. 14,
No. 4, July 1994), points a, b, and c correspond to sort-first
approaches and point d corresponds to a sort-middle
approach.

Sort-first refers to schemes where graphics objects are
sorted into their respective bands during geometry processing.
Sort-middle schemes redistribute or bucketize graphics
objects to their respective bands between geometry processing
and rasterization. Bucketization may involve explicit
clipping of graphics objects to the bands or such a clipping
could be performed on the fly during rendering when a PP
draws an object subject to the clip boundary.

In one embodiment of the invention, the raster image
processing is done in a sort-first manner as shown in FIGS.
2a and 2b. Referring to FIG. 1, points b and c are better
bucketization points as bucketization is significantly simplified
after transformation to device coordinate space in step
24. Although bucketization point c in FIG. 1 is shown after
clipping 28, it can also occur before.

Note that the bucketized paths or polygons, if bucketization
is at point c, have to be clipped to the band boundaries
when drawn. In one embodiment of the invention, it is the
responsibility of the PP processing the corresponding paths
to perform this clipping. A feature of the sort-first embodiment
is that the same PP performs a significant amount of
geometry processing and all of the rasterization tasks on
high level objects (paths or polygons).

A second scheme is a sort-middle scheme where primitives
are first generated from the high-level objects and are
then assigned to their respective bands, at point d in FIG. 1.
In one embodiment of the invention, one set of PPs can
perform the primitive decomposition (geometry processing)
while a different set may perform the rasterization depending
upon the bands to which a primitive is assigned.

One embodiment of a sort-first implementation using a
single-chip multiprocessor is shown in FIG. 2a. In this
embodiment, the master processor (MP) 52 performs language
interpretation and the coordinate transformations of
the resultant paths. The PPs 54 are used to accelerate the
language interpretation tasks, like dictionary searches, etc.
The PPs are also used to perform source data processing
tasks such as color conversion. The MP 52 can push such
tasks onto a task queue 68 and the PPs 54 can consume the
task in the queue. The MP can then continue with its thread
of execution. The paths generated in this process are
'bucketized' into their respective bands by the MP, which can
possibly use the PPs for acceleration of the task. In the
second stage of processing, the paths are decomposed to
polygons, the polygons to primitives (trapezoids, etc.) and
the primitives are rasterized.

In a second embodiment, the MP performs language
interpretation and coordinate transformations for page N, as
shown in FIG. 2b. A subset of the PPs block 54 from FIG.
2a are used as coprocessors by the MP. In this example, there
are 4 PPs, 60, 62, 64, and 66, but the single-chip multiprocessor
could have any number of PPs. In this example, the
first two PPs, 60 and 62, perform source data processing
tasks such as color conversion (based on a task queue
created by the MP) and can also act as dedicated coprocessors
for accelerating such tasks as dictionary searches. The
MP also performs bucketization, which was discussed
above. During this time, the other PPs, 64 and 66, perform
geometry processing and rasterization for the previous page.

Another embodiment of the invention is to use a
sort-middle approach. Examples of this type of implementation
are shown in FIGS. 3a-3c. The main processor performs
language interpretation, generates paths and passes them to
the geometry processing PPs, 60 and 62. This assignment
can be in a round-robin fashion or through a queue, where a
geometry processing PP that is free picks up the next element
in the task list 68. The geometry PPs, 60 and 62, then
generate the primitives list and pass it to the rasterization
PPs, 64 and 66. This could be done by the geometry PPs
pushing the primitive to either one of the rasterizing PPs'
queue. The frame buffer 58 is divided into two bands. One
rasterizing PP handles the top band, Band Buffer 0, and the
other PP handles the lower band, Band Buffer 1.

In a second embodiment, the geometry PPs 60 and 62
work on generating the primitives list for page N, 70, and the
rasterizing PPs 64 and 66 rasterize the display list for page
N-1, 72. This is shown in FIG. 3b. In this case the frame
buffer can be divided into multiple band buffers and the
virtual buffer approach discussed earlier can be used, unlike
the previous case.

FIG. 3c shows a third method of the sort-middle embodiment
of the invention. In this embodiment the PPs are time
shared. All four PPs, with four PPs meant only as an
example, perform geometry processing to create the primitives
list for the current page 72. The primitives list is then
used by the same four PPs for rasterization.

During the rasterization part of the overall process,
exemplified by FIG. 1, the PPs consume the graphics primitive list
(the display list) corresponding to their assigned band, one
at a time. In the above implementations, for example,
trapezoids may be the primary primitives. The rendering
program on the PPs first reduces the trapezoids to guide
tables. A guide table is an on-chip table that contains the base
address for each scan-line segment of the trapezoid and the
number of pixels in the segment.

Transfers between the MP, the various PPs and memory
buffers are handled by some type of transfer controller (TC)
(not shown). A TC is typically a sophisticated DMA controller
that can be programmed to handle block data
transfers, such as handling packet transfer requests in a
round-robin fashion. Once the guide table is generated, a
packet transfer request is submitted to the TC with the guide
table pointer and the requested fill value. The TC translates
the guide table to the appropriate sequence control signals
and loads the data and address buses with the appropriate
data to create the trapezoid in the band buffer.

The use of a single-chip multiprocessor with several
parallel processors and a main processor allows overlapping
between the guide table processing and trapezoid pixel data
transfers. The main processor may be on the same chip as the
parallel processors or the main processor may be on one chip
and the parallel processors on another. For example, the PP
could set up the guide table for the next transfer in one data
RAM, while the TC uses the guide table in another data
RAM to effect the current transfer. The single-chip
multiprocessor used should have some means for allowing
simultaneous access to these independent data RAMs. One
example of such an access means is through a crossbar
switch.
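
The band scheduling described earlier, in which the scheduler keeps free lists of PPs and band buffers, assigns them to bands in sequence, and recycles a buffer once its band has been transferred to the print engine, can be sketched as a small single-threaded simulation. Everything below is an assumption made for illustration: the function name, the made-up per-band rasterization costs, the use of a heap to stand in for PP interrupts, and the treatment of the transfer to the print engine as instantaneous.

```python
import heapq
from collections import deque


def schedule_bands(raster_costs, num_pps, num_buffers):
    """Simulate a band scheduler with free lists of PPs and band buffers.

    raster_costs[i] is a hypothetical rasterization time for band i.
    Returns (assignments, printed): assignments holds (band, pp, buffer)
    tuples in assignment order; printed is the order in which bands
    reach the print engine.
    """
    if num_pps < 1 or num_buffers < 1:
        raise ValueError("need at least one PP and one band buffer")
    free_pps = deque(range(num_pps))          # initially every PP is free
    free_buffers = deque(range(num_buffers))  # and so is every band buffer
    running = []    # min-heap of (finish_time, band, pp, buffer)
    rendered = {}   # band -> buffer holding its finished pixels
    assignments, printed = [], []
    next_band = next_to_print = 0
    now = 0

    while next_to_print < len(raster_costs):
        # Assign a free PP and a free band buffer to each band in sequence.
        while next_band < len(raster_costs) and free_pps and free_buffers:
            pp, buf = free_pps.popleft(), free_buffers.popleft()
            finish = now + raster_costs[next_band]
            heapq.heappush(running, (finish, next_band, pp, buf))
            assignments.append((next_band, pp, buf))
            next_band += 1

        if next_to_print in rendered:
            # The print engine wants the next band: transfer it, then
            # return its buffer to the free list.
            free_buffers.append(rendered.pop(next_to_print))
            printed.append(next_to_print)
            next_to_print += 1
        else:
            # Wait for the next "interrupt" from a PP that finished a band.
            now, band, pp, buf = heapq.heappop(running)
            rendered[band] = buf
            free_pps.append(pp)

    return assignments, printed
```

For example, `schedule_bands([3, 1, 2, 1, 1], num_pps=2, num_buffers=3)` delivers the five bands to the print engine in page order while reusing the three band buffers. Because buffers are handed out in band order, the lowest unprinted band always holds a buffer or is next in line for one, so the simulation cannot stall even when there are fewer buffers than PPs.
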
Thus, although there has been described to this point a
particular embodiment of a method to allocate rasterized
image processing among the processors on a single-chip
multiprocessor, it is not intended that such specific references
be considered as limitations upon the scope of this
invention except in-so-far as set forth in the following
claims.

What is claimed is: 

1. A method for sorting and processing print data, comprising
the steps of:

providing a single semiconductor chip containing a main
processor and plural parallel processors;

performing language interpretation tasks on said print
data using said main processor to create paths;

sending said language interpretation tasks from said main
processor to said parallel processors to accelerate said
language interpretation tasks by using said parallel
processors to perform said language interpretation
tasks to divide up said paths into their respective bands;

decomposing said paths to primitives; and

rasterizing said primitives using said parallel processors.

2. A method for sorting and processing print data, comprising
the steps of:

using a main processor of a single-chip multiprocessor to
perform language interpretation tasks on said print data
and to create paths for a current page of print data;

selecting a subset of parallel processors on said single-chip
multiprocessor and using said subset to accelerate
said language interpretation tasks for said current page;
and

performing geometry processing and rasterization processing
on a previous page of print data with a second
subset of said parallel processors in parallel with said
main processor performing said language interpretation
on said current page.

3. A method of scheduling raster image processing tasks 
on a main processor and several parallel processors, com- 
prising the steps of: 

providing a plurality of parallel processors and a main 
processor, said main processor having a list of free 
parallel processors scheduler process running thereon; 

assigning free band buffers and parallel processors to 
bands of print data in sequence from said list of free 
parallel processors; 

providing a server process on a said parallel process and 
a band buffer; 

allowing said server process on said parallel processor
assigned to said bands to rasterize primitives from said
bands into said band buffer;

interrupting said scheduler process when a said server
process signals that it has completed said rasterization
into one of said band buffers;

adding said parallel processor sending said signal to said
list of free parallel processors; and

transferring contents of said rasterized band buffer to a
print engine.

* * * * * 